11 research outputs found

    A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation

    Sketch-Based Image Retrieval (SBIR) is a crucial task in multimedia retrieval, where the goal is to retrieve a set of images that match a given sketch query. Researchers have already proposed several well-performing solutions for this task, but most focus on enhancing embeddings through approaches such as triplet loss, quadruplet loss, data augmentation, and edge extraction. In this work, we tackle the problem from several angles. We start by examining the quality of the training data and point out some of its limitations. We then introduce a Relative Triplet Loss (RTL), an adapted triplet loss that overcomes those limitations by weighting the loss based on anchor similarity. Through a series of experiments, we demonstrate that replacing the triplet loss with RTL outperforms the previous state-of-the-art without the need for any data augmentation. In addition, we demonstrate why batch normalization is better suited for SBIR embeddings than l2-normalization and show that it significantly improves the performance of our models. We further investigate the model capacity required for the photo and sketch domains and demonstrate that the photo encoder requires a higher capacity than the sketch encoder, which validates the hypothesis formulated in [34]. Finally, we propose a straightforward knowledge-distillation approach to train small models, such as ShuffleNetV2 [22], efficiently with only a marginal loss of accuracy. The same approach applied to larger models enabled us to outperform previous state-of-the-art results and achieve a recall of 62.38% at k = 1 on The Sketchy Database [30].
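
    The abstract does not spell out the exact RTL formulation, so the following is a minimal PyTorch sketch of the general idea under our own assumptions: a standard triplet margin loss whose per-triplet terms are re-weighted by a weight vector derived from anchor similarity. The weighting scheme and the name relative_triplet_loss are illustrative, not the paper's definition.

        # Hedged sketch of a similarity-weighted triplet loss. The weights
        # argument stands in for the anchor-similarity weighting that the
        # abstract describes; how those weights are computed is an assumption.
        import torch
        import torch.nn.functional as F

        def relative_triplet_loss(anchor, positive, negative, weights, margin=0.2):
            """anchor, positive, negative: (B, D) embeddings; weights: (B,)."""
            d_pos = F.pairwise_distance(anchor, positive)
            d_neg = F.pairwise_distance(anchor, negative)
            losses = F.relu(d_pos - d_neg + margin)
            return (weights * losses).mean()

        # Toy usage with random embeddings and uniform weights.
        a, p, n = (torch.randn(16, 128) for _ in range(3))
        print(relative_triplet_loss(a, p, n, torch.ones(16)))

    Consistent with the abstract's point about normalization, the (B, D) embeddings above could be passed through torch.nn.BatchNorm1d(D) rather than being l2-normalized.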

    Semantic Sketch-Based Video Retrieval with Autocompletion

    The IMOTION system is a content-based video search engine that provides fast and intuitive known-item search in large video collections. User interaction consists mainly of sketching, which the system recognizes in real time and uses to make suggestions based on both the visual appearance of the sketch (what the sketch looks like in terms of colors, edge distribution, etc.) and its semantic content (what object the user is sketching). The latter is enabled by a predictive sketch-based UI that identifies likely candidates for the sketched object via state-of-the-art sketch recognition techniques and offers on-screen completion suggestions. In this demo, we show how the sketch-based video retrieval of the IMOTION system performs on a collection of roughly 30,000 video shots. The system indexes the collection with over 30 visual features describing color, edge, motion, and semantic information. The resulting feature data is stored in ADAM, an efficient database system optimized for fast retrieval.
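
    As a rough illustration of the autocompletion idea, the helper below runs a sketch classifier over a partial sketch and surfaces the top-k labels as completion suggestions. The classifier is a stand-in parameter; IMOTION's actual recognizer and ranking details are not described at this level.

        # Hypothetical top-k suggestion helper for a predictive sketch UI.
        import torch

        def completion_suggestions(model, sketch, class_names, k=5):
            """Return the k most likely object labels for a partial sketch tensor."""
            with torch.no_grad():
                logits = model(sketch.unsqueeze(0))            # (1, num_classes)
                probs = torch.softmax(logits, dim=1).squeeze(0)
                top = torch.topk(probs, k)
            return [(class_names[int(i)], float(p))
                    for p, i in zip(top.values, top.indices)]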

    UMons at MediaEval 2015 Affective Impact of Movies Task including Violent Scenes Detection

    In this paper, we present the work done at UMons for the MediaEval 2015 Affective Impact of Movies Task (including Violent Scenes Detection). This task can be divided into two subtasks: on the one hand, Violent Scenes Detection, i.e., automatically finding violent scenes in a set of videos; on the other hand, evaluating the affective impact of a video through an estimation of valence and arousal. To offer a solution for both the detection and classification subtasks, we investigate different visual and auditory feature extraction methods. An i-vector approach is applied to the audio, and optical flow maps processed through a deep convolutional neural network are tested for extracting features from the video. Classifiers based on probabilistic linear discriminant analysis and fully connected feed-forward neural networks are then used.
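
    A minimal sketch of the second classifier family mentioned above: a fully connected feed-forward network over concatenated audio i-vectors and CNN features from optical-flow maps. All dimensions (400-d i-vectors, 4096-d flow features, hidden size) are illustrative assumptions, not the paper's configuration.

        # Assumed fusion classifier: concatenate audio and flow features,
        # then classify with a small feed-forward network.
        import torch
        import torch.nn as nn

        class FusionClassifier(nn.Module):
            def __init__(self, ivector_dim=400, flow_dim=4096, hidden=512, classes=2):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(ivector_dim + flow_dim, hidden),
                    nn.ReLU(),
                    nn.Dropout(0.5),
                    nn.Linear(hidden, classes),   # e.g. violent vs. non-violent
                )

            def forward(self, ivec, flow_feat):
                return self.net(torch.cat([ivec, flow_feat], dim=1))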

    Enhanced Retrieval and Browsing in the IMOTION System

    This paper presents the IMOTION system in its third version. While still focusing on sketch-based retrieval, we improved upon the semantic retrieval capabilities introduced in the previous version by adding more detectors and improving the interface for semantic query specification. Compared to the previous year's system, we increase the role of features obtained from Deep Neural Networks in three areas: semantic class labels for more entry-level concepts, hidden-layer activation vectors for query-by-example, and a 2D semantic-similarity result display. The new graph-based result navigation interface further enriches the system's browsing capabilities. The updated database storage system ADAMpro, designed from the ground up for large-scale multimedia applications, ensures scalability to steadily growing collections.
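
    The hidden-layer-activation idea can be sketched as follows: embed both the example image and the database keyframes with a truncated CNN and rank by cosine similarity. The backbone choice (ResNet-18) and input size are our assumptions; the paper does not name the network here.

        # Assumed query-by-example over penultimate-layer CNN activations.
        import torch
        import torchvision.models as models

        backbone = models.resnet18(weights=None)
        backbone.fc = torch.nn.Identity()   # expose penultimate activations
        backbone.eval()

        def embed(images):                  # images: (N, 3, 224, 224)
            with torch.no_grad():
                feats = backbone(images)
            return torch.nn.functional.normalize(feats, dim=1)

        def rank(query, database):          # cosine similarity, best first
            return torch.argsort(embed(query) @ embed(database).T, descending=True)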

    The IMOTION System at TRECVID 2016: The Ad-Hoc Video Search Task

    In this paper, we describe the details of our participation in the TRECVID 2016 Ad-Hoc Video Search (AVS) task with the IMOTION system.

    IMOTION — a content-based video retrieval engine

    This paper introduces the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. For vector space retrieval, the IMOTION system exploits a large variety of low-level image and video features, as well as high-level spatial and temporal features, all of which can be used jointly in any combination. In addition, it supports dedicated motion features to allow for the specification of motion within a video sequence. For query specification, the IMOTION system supports query-by-sketch interactions (users provide sketches of video frames), motion queries (users specify motion across frames via partial flow fields), query-by-example (based on images), and any combination of these, and provides support for relevance feedback.
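
    One plausible way to use many features "jointly in any combination", as described above, is simple late fusion: normalize each feature's distances and sum them with user-chosen weights. The feature names and weights below are placeholders, not IMOTION's actual configuration.

        # Illustrative late fusion of per-feature distances (smaller = better).
        import numpy as np

        def fuse(distances, weights):
            """distances: dict name -> (N,) array; weights: dict name -> float."""
            total = np.zeros_like(next(iter(distances.values())), dtype=float)
            for name, d in distances.items():
                d = (d - d.min()) / (d.max() - d.min() + 1e-9)  # min-max normalize
                total += weights.get(name, 0.0) * d
            return np.argsort(total)  # indices of best matches first

        ranking = fuse(
            {"color": np.random.rand(1000), "edge": np.random.rand(1000),
             "motion": np.random.rand(1000)},
            {"color": 0.5, "edge": 0.3, "motion": 0.2})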

    iAutoMotion - an Autonomous Content-based Video Retrieval Engine

    This paper introduces iAutoMotion, an autonomous video retrieval system that requires only minimal user input. It is based on the video retrieval engine IMOTION. iAutoMotion uses a camera to capture the input for both visual and textual queries and performs query composition, retrieval, and result submission autonomously. For the visual tasks, it applies various visual features to the captured query images; for the textual tasks, it applies OCR and some basic natural language processing, combined with object recognition. As the iAutoMotion system does not conform to the VBS 2016 rules, it will participate as an unofficial competitor and serve as a benchmark for the manually operated systems.
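
    The textual-query path ("OCR and some basic natural language processing") could look roughly like the sketch below: OCR the captured image of the task description, then keep the content words as query terms. The library choice (pytesseract) and the stop-word filtering are assumptions; the paper specifies neither.

        # Assumed OCR-based textual query extraction.
        from PIL import Image
        import pytesseract

        STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "with"}

        def textual_query(image_path):
            """OCR a captured task description and return content-word query terms."""
            text = pytesseract.image_to_string(Image.open(image_path))
            return [w for w in text.lower().split()
                    if w.isalpha() and w not in STOPWORDS]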

    IMOTION - Searching for Video Sequences using Multi-Shot Sketch Queries

    This paper presents the second version of the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. Since its first version, IMOTION has supported the search for video sequences on the basis of still images, user-provided sketches, or the specification of motion via flow fields. For the second version, the functionality and the usability of the system have been improved. It now supports multiple input images (such as sketches or still frames) per query, as well as the specification of objects to be present within the target sequence. The results are grouped either by video or by sequence, and the support for selective and collaborative retrieval has been improved. Special features have been added to encapsulate semantic similarity.
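
    Grouping ranked shot results by video, as mentioned above, reduces to a small aggregation step; the sketch below orders videos by their best-scoring shot. The (video_id, shot_id, score) tuples are synthetic placeholders rather than IMOTION's data model.

        # Toy grouping of ranked shot results by video (higher score = better).
        from collections import defaultdict

        def group_by_video(shot_results):
            """shot_results: iterable of (video_id, shot_id, score) tuples."""
            groups = defaultdict(list)
            for video_id, shot_id, score in shot_results:
                groups[video_id].append((shot_id, score))
            ordered = sorted(groups.items(),
                             key=lambda kv: max(s for _, s in kv[1]),
                             reverse=True)
            return [(vid, sorted(shots, key=lambda x: x[1], reverse=True))
                    for vid, shots in ordered]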